Neko is an open-source maintainer focused on cloud-native tooling, best known for the Ollama Operator, a Kubernetes controller that streamlines the deployment, scaling, and lifecycle management of large language models by wrapping the Ollama inference engine in familiar kubectl semantics.

The project targets ML engineers, DevOps teams, and hobbyists who want to run Llama, Mistral, CodeLlama, and similar models inside existing clusters without hand-writing StatefulSets, PVCs, or Services: a single custom-resource YAML declares the desired model, resource limits, and replica count, and the operator handles image pulls, pod placement, horizontal scaling, and rolling updates automatically. Typical use cases include internal chatbots for enterprise knowledge bases, GPU-accelerated coding assistants attached to CI pipelines, and low-latency summarization microservices that autoscale during business hours.

Because manifests are plain Kubernetes objects, they slot cleanly into GitOps workflows, letting platform teams version, review, and promote model configurations across dev, staging, and production the same way they ship application code. Metrics and events are exposed through the standard Prometheus and Kubernetes APIs, so administrators can wire cost controls or quota policies around GPU hours.
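As a sketch of what such a custom-resource manifest might look like: the `Model` kind and `ollama.ayaka.io/v1` API group follow the project's published examples, but the exact schema, the `phi` model tag, and the `replicas`/`resources` fields shown here are illustrative assumptions that should be checked against the operator's installed CRD.

```yaml
# Hypothetical Model manifest for the Ollama Operator.
# Verify field names against the CRD in your cluster before use.
apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: phi
spec:
  image: phi              # Ollama model tag to pull and serve
  replicas: 2             # desired number of inference pods
  resources:
    limits:
      nvidia.com/gpu: "1" # one GPU per replica (assumes GPU-equipped nodes)
```

Applied with `kubectl apply -f model.yaml`, a manifest like this hands the rest of the work (image pull, pod placement, Service creation, rolling updates) to the operator, and the resource can be inspected with ordinary `kubectl get`/`kubectl describe` like any other Kubernetes object.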

Ollama Operator

Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫

Details